Introduction

In this presentation we will analyze data about the history of Formula One. We will explore data relating to drivers, constructors, countries, circuits, race results and more. We will try to find out which drivers and constructors have dominated this motorsport during its 71 years. To evaluate their performance, the presentation proposes a ranking system based on multiple metrics. The presentation also includes some statistics and information about F1 itself. All the data used are available here and consist of 13 CSV files. The source code is available here.

What is Formula One

Formula One (abbreviated to F1) is the highest class of international auto racing for single-seater racing cars sanctioned by the Fédération Internationale de l’Automobile (FIA). A Formula One season consists of a series of races, known as Grands Prix, which take place worldwide on purpose-built circuits and on closed public roads.

The results of each race are evaluated using a points system to determine two annual World Championships: one for drivers and one for constructors.

Formula One cars are considered the fastest regulated road-course racing cars in the world, characterized by very high cornering speeds, achieved through the generation of large amounts of aerodynamic downforce. Most modern F1 cars can reach peaks of 6.5 lateral g while cornering and top speeds of approximately 360 km/h. Traction control and other driving aids have been banned since 2008.

You can learn more about how F1 works by watching this video.

Some history about F1

Formula One has its roots in the European Grand Prix championships of the 1920s and 1930s. The foundation of F1 began in 1946 with the FIA’s standardisation of rules, which was followed by the first World Championship of Drivers in 1950.

The history of Formula One is usually divided into eras, but this division is typically based on subjective criteria and there is no official one. We will dive into the history of Formula One by introducing some of its most iconic drivers and querying the data for key results from their careers, such as number of wins and number of podiums.

By analyzing the data we can also obtain some numbers characterizing the history of F1:

You can learn more about F1 history by watching this video and this one.

First F1 winner

Giuseppe Farina (also known as Giuseppe Antonio “Nino” Farina) won the first F1 race and the first F1 Drivers Championship. Farina drove for Ferrari, Alfa Romeo and Lancia. Let’s see what the data can tell us about this iconic driver:

“Because of the crazy way Farina drove only the Holy Virgin was capable of keeping him on the track.” - Juan Manuel Fangio

Domination of the 50s by Fangio & Ascari

After the win of the first championship by Farina, the 1950s were dominated by two iconic drivers, Juan Manuel Fangio and Alberto Ascari:

Fangio’s records remained unbeaten for over 30 years. The 1950s are remembered as the deadliest decade in F1, with 15 casualties. Helmets became mandatory in 1952, 2 years after the birth of F1. Seatbelts became mandatory in 1972.

The driver of the 1960s

The first driver to beat Fangio’s wins record was Jim Clark, a driver who dominated the circuits in the 1960s.

“Jim Clark was everything I aspired to be, as a racing driver and as a man” - Sir Jackie Stewart

The Flying Scot

Clark’s record of 25 wins was broken by his greatest admirer: Sir John Young “Jackie” Stewart.

The battle between teammates

It took 14 years for Stewart’s win record to be broken; the driver who did it, and who opened a new era, was Alain Prost.

One famous period of Formula One is the Prost-Senna rivalry. The rivalry between the two drivers was at its most intense while they were teammates at McLaren (1988-1989).

Ayrton Senna is probably the most loved driver in F1 history. Senna distinguished himself throughout his career and was regarded as a prodigy who would break every record. Unfortunately, Senna died aged 34 after a crash during the San Marino Grand Prix on 1 May 1994.

“Racing, competing, it’s in my blood. It’s part of me, it’s part of my life; I have been doing it all my life and it stands out above everything else.” - Ayrton Senna

The Ferrari Era

One of the most iconic drivers, if not the most iconic, is Michael Schumacher, who during his career broke most of the records of this motorsport, setting a new standard that had seemed beyond imagination.

“I always thought records were there to be broken.” - Michael Schumacher

The Hybrid Era: the birth of a new legend

The hybrid era (2014-present) has been dominated by Mercedes and by its top driver, Lewis Hamilton.

Hamilton is the first driver to break Schumacher’s records and is currently considered by many to be the best F1 driver.

You can learn more about these iconic drivers by watching this video.

Who’s the most dominant driver of all time?

Which driver has dominated the circuits during the history of Formula One? It’s difficult to tell, but data can help us! Drivers are often compared using only simple metrics, such as races won or world titles. These metrics certainly reflect a driver’s skill, but they are superficial evaluation criteria. For example, suppose driver a has won 100 races out of 500 (a 20% win rate) and driver b has won 70 races out of 120 (about 58%). Would you still say that driver a is better than driver b just because he has won more races? The data collected during these 71 years can help us build a more advanced evaluation system, based on multiple parameters, which will allow us to rank drivers in a more transparent and objective way. Let’s start by showing some basic driver summaries, which we will use later to evaluate their performance.

The problem with different scoring systems

A significant metric in evaluating the skills of a driver is the number of points accumulated during their career. However, there is a problem: scoring systems have changed over the years, so it is unfair to draw conclusions by simply considering the total points scored by each driver. In the following plot we can see how points systems have changed during the history of Formula One.

Notice the significant step in maximum points available in 2010, when a win went from being worth 10 points to being worth 25. Clearly, evaluating drivers on the basis of total points would be unfair to the drivers who raced before 2010.

A fairer approach is the following: recalculate the points awarded to drivers for each race in the history of Formula 1, always using the same scoring system. To normalize the data we will use the current scoring system (but without the fastest lap bonus point, introduced in 2019). Points are awarded to the top 10 drivers as follows: 25 for 1st place, then 18, 15, 12, 10, 8, 6, 4, 2 and 1 for 2nd through 10th. The following two graphs show the ranking of the drivers by total points, before and after the data normalization.

As you can see, the results still have Hamilton dominating the standings. However, Michael Schumacher is now much closer, and a few more drivers from earlier years have started to appear.

The results of the 1047 races for each driver allow us to calculate other useful performance metrics as well, such as the number of podiums and the number of pole positions. In motorsports, the pole position is the position at the inside of the front row at the start of a racing event. This position is typically given to the vehicle and driver with the best qualifying time in the trials before the race (the leader of the starting grid). This number-one qualifying driver is referred to as the pole-sitter. By number of podiums we mean the number of 1st, 2nd and 3rd place finishes.
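The normalization described above can be sketched in a few lines. This is only an illustration, not the code used for the presentation: the names `POINTS_SCALE`, `normalized_points` and `career_points` are hypothetical, and the example career is made up.

```python
# Fixed scale for finishing positions 1..10 (current system, no fastest-lap bonus).
POINTS_SCALE = (25, 18, 15, 12, 10, 8, 6, 4, 2, 1)


def normalized_points(position):
    """Points for one race result under the fixed scale.

    `position` is the 1-based finishing position, or None when the driver
    was not classified (assumption: such results score zero).
    """
    if position is None or not 1 <= position <= len(POINTS_SCALE):
        return 0
    return POINTS_SCALE[position - 1]


def career_points(finishes):
    """Sum the normalized points over a list of finishing positions."""
    return sum(normalized_points(p) for p in finishes)


# Made-up career: a win, a second place, an 11th place and a retirement.
example_career = [1, 2, 11, None]
print(career_points(example_career))  # 25 + 18 + 0 + 0 = 43
```

Because every race is scored with the same scale, totals from different decades become directly comparable, whatever points were actually awarded at the time.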

The problem with different number of races

Similarly to the problem highlighted above for scoring systems, the metrics illustrated so far (wins, podiums, pole positions and total points) are also influenced by another factor that varies over the years: the number of races per season. So, for example, if driver a participated only in a season with 10 races, and driver b participated only in a season with 20 races, driver b had twice as many chances to win races, and so on. A fairer approach is to consider the rate of each metric, which means working with relative frequencies instead of absolute frequencies. New performance metrics are calculated in order to make the number of held races irrelevant:

  • \(win \; rate = \frac{wins}{held \; races}\)
  • \(podium \; rate = \frac{podiums}{held \; races}\)
  • \(points \; rate = \frac{tot \; points \; normalized}{tot \; points \; potential}\)
  • \(pole \; rate = \frac{pole \; positions}{held \; races}\)

Here tot points potential is equal to 25 times held races. We need to set a requirement for a driver to be admitted to the comparison. If a driver won 1 race out of 1, starting from pole position, he would be placed first in every classification, even if he is an irrelevant driver in the history of F1. This threshold helps us remove outliers and insignificant values. The requirement to be included in the comparison is to have participated in at least 20 races (about 1 season).
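As a rough sketch of how the four rates and the 20-race threshold fit together, consider the following. The `Career` class, its field names and the sample numbers are assumptions made for illustration; they do not come from the dataset.

```python
from dataclasses import dataclass

MIN_RACES = 20            # admission threshold stated above
MAX_POINTS_PER_RACE = 25  # points for a win under the normalized scale


@dataclass
class Career:
    """Hypothetical per-driver summary; `points` are normalized points."""
    races: int
    wins: int
    podiums: int
    poles: int
    points: int


def rates(career):
    """Return the four rates, or None if the driver is below the threshold."""
    if career.races < MIN_RACES:
        return None
    return {
        "win_rate": career.wins / career.races,
        "podium_rate": career.podiums / career.races,
        "points_rate": career.points / (MAX_POINTS_PER_RACE * career.races),
        "pole_rate": career.poles / career.races,
    }


one_race_wonder = Career(races=1, wins=1, podiums=1, poles=1, points=25)
veteran = Career(races=100, wins=20, podiums=40, poles=10, points=1000)

print(rates(one_race_wonder))  # None: excluded by the threshold
print(rates(veteran))
```

Without the threshold, the one-race driver would have a rate of 1.0 on every metric and would top every ranking, which is exactly the outlier the requirement is meant to remove.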

Now that we have isolated the two main factors of variability between different F1 eras, we can define a more reasonable evaluation criterion based on the rates of the metrics illustrated above. The performance metric, which represents a driver’s success and skills during their career, is calculated as follows:

\[performance = \frac{1}{4} * win \; rate+ \frac{1}{4}*podium \; rate + \frac{1}{4} *points \; rate+\frac{1}{4}*pole \; rate\]

The choice of weights is questionable and must be considered a subjective choice. We may suppose that assigning the same weight to each metric is a compromise that should not penalize any driver.

As before, only drivers with at least 20 held races are considered in the ranking.

As we can see, the top of the ranking contains most of the iconic F1 drivers illustrated in the introduction. Lewis Hamilton currently holds most of the records, but the result we get by reasoning on rates and normalized data puts Juan Manuel Fangio in first position.
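To make the role of the weights concrete, here is a minimal sketch of the performance formula as a weighted sum. The function name, the dictionary keys and the sample rates are hypothetical; only the equal weights of 1/4 come from the formula above.

```python
# Equal weights, as in the performance formula above.
EQUAL_WEIGHTS = {
    "win_rate": 0.25,
    "podium_rate": 0.25,
    "points_rate": 0.25,
    "pole_rate": 0.25,
}


def performance(rates, weights=EQUAL_WEIGHTS):
    """Weighted sum of the four rates; the weights are expected to sum to 1."""
    return sum(weights[name] * rates[name] for name in weights)


# Made-up rates for a single driver.
sample = {"win_rate": 0.2, "podium_rate": 0.4, "points_rate": 0.4, "pole_rate": 0.1}
print(performance(sample))

# An alternative, equally subjective weighting that favours wins.
win_heavy = {"win_rate": 0.4, "podium_rate": 0.2, "points_rate": 0.2, "pole_rate": 0.2}
print(performance(sample, win_heavy))
```

Passing a different `weights` dictionary is enough to try another weighting, which makes it easy to check how sensitive a ranking is to this choice.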

Drivers of the past had to struggle against mechanical failures

Another factor that penalizes past drivers is that car reliability has improved over time, so modern drivers have an advantage in their performance metrics. Consider this example: driver a and driver b both started 100 races. Driver a is from the 50s and had 40 mechanical failures: his real win potential is 60. Driver b is a recent F1 driver and had only 5 mechanical failures: his real win potential is 95. A fairer approach would take this issue into account. In order to isolate the reliability factor we can recalculate rates by using completed races rather than held races. The graph below shows how old cars had a higher probability of mechanical failure.

We can clearly see that modern drivers have had, in proportion to the races started, far fewer retirements due to mechanical problems; they therefore have an advantage in the calculation of the performance metric. We can fix this problem by introducing a performance multiplier for each driver, calculated as follows:

\[bonus = \frac{number \; of \; retirements \; due \; to \; mechanical \; failures}{held \; races}\]

And then we can calculate again their performance score, simply by applying the mechanical failures bonus: \(new \; performance = (1+bonus) * performance\). The new ranking is the following:

Interestingly, with this approach the list is dominated by past drivers rather than modern ones. The choice to include the reliability (mechanical failures) bonus is questionable: some might say that the number of retirements due to mechanical failures does not depend entirely on car reliability but also on the driver’s ability to drive the car, so a deeper analysis should be performed in this regard.
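The effect of the bonus can be illustrated with the two drivers from the reliability example (40 and 5 mechanical failures in 100 races). This is only a sketch: the function names are hypothetical, and the common base performance of 0.30 is an assumed value chosen to isolate the effect of the multiplier.

```python
def reliability_bonus(mechanical_retirements, races):
    """Share of held races that ended with a mechanical failure."""
    if races <= 0:
        raise ValueError("races must be positive")
    return mechanical_retirements / races


def adjusted_performance(performance, mechanical_retirements, races):
    """Apply the multiplier: new performance = (1 + bonus) * performance."""
    return (1 + reliability_bonus(mechanical_retirements, races)) * performance


BASE = 0.30  # assumed identical base performance for both drivers

driver_a = adjusted_performance(BASE, mechanical_retirements=40, races=100)
driver_b = adjusted_performance(BASE, mechanical_retirements=5, races=100)
print(round(driver_a, 3), round(driver_b, 3))
```

With the same base score, the driver from the 50s ends up well ahead, which is consistent with the new ranking being dominated by past drivers.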

Table with metrics of all drivers with at least 20 races held (250 out of 850):

What if we had to give a flag to F1?

Formula One is an international motorsport, but which country has been the most present during these 71 years? Let’s find out through the data we have available. Checking the nationality of the 850 drivers, 211 constructors and 74 circuits, we count a total of 49 nations involved in this motorsport. Let’s visualize the top 6 countries with the highest number of drivers, constructors and circuits used.

For each of these categories we can extract more meaningful information. We count the total number of races contested by drivers from each nation, constructors’ championships, and the number of race weekends for circuits, all grouped by nationality. In this way we can measure the presence of each country during these 71 years.

We can already see that the United Kingdom has a very high number of drivers and constructors. If we set up a weighting system based on those three metrics, we can decide which nation has been the most present throughout the history of F1. We will use the following weights:

We give the same weight to the factors representing the participation of drivers and constructors, but we give a small bonus to the factor representing where Grands Prix were held. This is because an F1 event is made possible not only by drivers and constructors, but also by fans, organizers, first aiders, track technicians etc. The score for each country is computed as follows:

\[score = \frac{3}{10}*number \; of \; race \; weekends + \frac{4}{10}*races \; held \; by \; drivers + \frac{3}{10}*seasons \; held \; by \; constructors\]

We summarize our results with the following word cloud, where the size of each country depends on its score.

We can clearly see that the United Kingdom has been the most present country in the history of F1. This result is probably explained by the large number of drivers and constructors coming from there. A big contribution also comes from the historic Silverstone Circuit, which has hosted 55 races. In second place we find Italy. Its score owes much to the first drivers of the 50s, to the presence of Ferrari (the only constructor present in all the championships) and to the Autodromo Nazionale di Monza, currently the circuit with the highest number of races hosted (70).
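For reference, the country score can be sketched as a plain weighted sum. The weights below are copied from the score formula exactly as it is written above; the function name, the key names and the two sets of counts are hypothetical.

```python
# Weights taken from the score formula as written above.
SCORE_WEIGHTS = {
    "race_weekends": 0.3,
    "driver_races": 0.4,
    "constructor_seasons": 0.3,
}


def country_score(race_weekends, driver_races, constructor_seasons):
    """Weighted sum of the three presence counts for one country."""
    return (
        SCORE_WEIGHTS["race_weekends"] * race_weekends
        + SCORE_WEIGHTS["driver_races"] * driver_races
        + SCORE_WEIGHTS["constructor_seasons"] * constructor_seasons
    )


# Made-up counts for two fictional countries:
# (race weekends hosted, races held by drivers, seasons held by constructors).
counts = {"X": (50, 1000, 100), "Y": (70, 400, 60)}

ranking = sorted(counts, key=lambda c: country_score(*counts[c]), reverse=True)
print(ranking)  # ['X', 'Y']
```

In this sketch the raw counts are used directly, so in the made-up data the much larger driver count contributes most of each score.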

To get an overall and clearer view, we display our data on a geographic heatmap:

Let’s have a closer look at Europe:

If you want to learn more about what happens behind the scenes of every Grand Prix, watch this video.

Which country raised the fastest drivers?

Using the data on drivers and their performance, we can estimate the average performance of drivers from each country. Only countries that have had at least 15 drivers enter the ranking, and we compute the average over only the 15 best drivers of each.

Curiously, the ranking is very similar to that of the most present countries. One possible reason is that if a nation has had many drivers, it is more likely that some of them were very good. It must also be taken into consideration that constructors may prefer to hire drivers of their own nationality; moreover, the constructors themselves often launch initiatives and projects to develop and select talent in their own territory. Examples are the Ferrari Driver Academy in Italy and the McLaren Young Driver Programme in the UK.
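A possible way to compute this per-country average is sketched below. The function name and the `(nationality, performance)` input format are assumptions, and the small usage example lowers the cut-off to 2 only to keep the made-up data short.

```python
from collections import defaultdict

TOP_N = 15  # both the admission threshold and the number of drivers averaged


def country_averages(drivers, top_n=TOP_N):
    """Average performance of the `top_n` best drivers of each country.

    `drivers` is an iterable of (nationality, performance) pairs.
    Countries with fewer than `top_n` drivers are left out.
    """
    by_country = defaultdict(list)
    for nationality, perf in drivers:
        by_country[nationality].append(perf)

    averages = {}
    for country, scores in by_country.items():
        if len(scores) < top_n:
            continue  # not enough drivers to enter the ranking
        best = sorted(scores, reverse=True)[:top_n]
        averages[country] = sum(best) / top_n
    return averages


# Made-up data, using top_n=2 for brevity.
sample = [("A", 0.5), ("A", 0.3), ("A", 0.1), ("B", 0.9)]
print(country_averages(sample, top_n=2))  # only country A qualifies
```

Averaging only the best drivers keeps a country with many weak entries from being dragged down, while the threshold keeps a single outstanding driver from putting a country on top.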

In the following table we recap the main metrics of the 49 nations that participated in Formula One.

Which is the most successful constructor of all time?

Formula One is not all about drivers: each of them competes for a constructor (racing team), which is responsible for researching, designing and building the car. Obviously a driver’s chance of winning a race or the world title also depends on the constructor he races for. A faster car is better than a slower one! So let’s see which constructors have been the most present and the most successful over the 71 years of Formula One. The following plot illustrates the evolution of the number of seasons held for the 10 constructors that participated the most. Some of them are no longer racing. It is worth noting that the graph starts from 1950, but the constructors’ championship was actually introduced in 1958.

As we can see, Ferrari is at the top of the ranking (with 71 seasons out of 71 - in fact it is the only constructor present from day one), in second position we have McLaren with 51 seasons, and then Williams with 45 seasons. Now let’s see who has won the most world titles.

With the following graphs, we summarize the ranking of constructors with the highest number of world titles (constructors’ championships) and the relative win rate: \(\frac{world \; titles}{participations}\).

We notice that Brawn GP has a win rate of 100%. In fact this team won the only championship in which it participated (2009).

As in the drivers’ case, data analysis lets us extract some useful information. As we did for drivers, for each constructor we calculate the total points scored with normalized data (same scoring system for each year) in order to set up a fair rating system. We also count held races, wins, podiums and poles, and for each of these metrics we calculate the rate (relative frequency) in order to remove the advantage of the teams that participated the most. We then set up a performance metric like the one for drivers:

\[performance = \frac{1}{4} * win \; rate+ \frac{1}{4}*podium \; rate + \frac{1}{4} *points \; rate+\frac{1}{4}*pole \; rate\]

Similarly to drivers, only constructors with a minimum of 50 held races are allowed in the comparison. The normalization related to the mechanical failure rate is not applied here, so there is no bonus to the performance metric. This choice derives from the fact that a constructor with a lower probability of mechanical failure should be rewarded, since it is the party chiefly responsible for the car. The results are reported in the following diagram:

As we can see, the ranking is dominated by Mercedes. This result is attributable to the excellent results obtained in recent years. One thing is certain: in recent years Mercedes has been doing something never before seen in the 71-year history of the Formula One World Championship. We have seen similar periods of dominance in the sport before, however, such as McLaren’s 1988 season, in which the stunning MP4/4 won 15 of the 16 races in the hands of Ayrton Senna and Alain Prost, or 2002, when Michael Schumacher and Rubens Barrichello combined to win 15 of the 17 races, sealing Schumi’s historic 5th World Championship and equalling the great Juan Manuel Fangio in the process.

In the following table we recap the main metrics of the 211 constructors that participated in Formula One.

Conclusions

Through data analysis we were able to evaluate the performance of drivers and constructors on the basis of multiple metrics. Those metrics allowed us to define a more sophisticated ranking system which, based on some assumptions, isolates some of the variability factors tied to the changes that have occurred over the decades, and so supports a fairer and more objective evaluation. We were also able to draw conclusions about countries, discovering which of them participated the most and which is the homeland of the best drivers.

The proposed ranking systems are based on subjective assumptions and parameters, but in my opinion they can be considered more balanced methods than the classic ones which are based on the simple count of absolute frequencies of basic metrics.

Further improvements

Undoubtedly, the rating systems can be improved, especially the one for drivers. For example, we could consider, for each race, which driver set the fastest lap and award him some performance score. We could also evaluate drivers with respect to positions gained (position on the starting grid versus final position). A very complex metric that could be included is the consistency of lap times. Unfortunately, these metrics require data on each lap of each race in the history of F1. The dataset provides these data only partially: lap-by-lap data are available for only about half of the races, which is why these metrics are not considered here. Using them would have disadvantaged the drivers who took part in the races for which the values are missing.

Another factor that is often a source of discussion among fans is the car’s performance. Sometimes a good driver unfortunately finds himself in a non-competitive team and is thus unable to demonstrate his skills. It must be said, however, that the best drivers are typically hired by the most successful teams, and vice versa. It would be interesting, but at the same time very complex, to try to isolate this factor.

Some curiosities about F1

Estimation of the total km traveled in the races

Formula One and Blockchain